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We introduce a highly-parallelizable architecture for estimating parameters of compact binary 
coalescence using gravitational-wave data and waveform models. Using a spherical harmonic mode 
decomposition, the waveform is expressed as a sum over modes that depend on the intrinsic parame¬ 
ters (e.g. masses) with coefficients that depend on the observer dependent extrinsic parameters (e.g. 
distance, sky position). The data is then prehltered against those modes, at hxed intrinsic param¬ 
eters, enabling efficiently evaluation of the likelihood for generic source positions and orientations, 
independent of waveform length or generation time. We efficiently parallelize our intrinsic space 
calculation by integrating over all extrinsic parameters using a Monte Carlo integration strategy. 

Since the waveform generation and preffitering happens only once, the cost of integration dominates 
the procedure. Also, we operate hierarchically, using information from existing gravitational-wave 
searches to identify the regions of parameter space to emphasize in our sampling. As proof of con¬ 
cept and verification of the result, we have implemented this algorithm using standard time-domain 
waveforms, processing each event in less than one hour on recent computing hardware. For most 
events we evaluate the marginalized likelihood (evidence) with statistical errors of < 5%, and even 
smaller in many cases. With a bounded runtime independent of the waveform model starting fre¬ 
quency, a nearly-unchanged strategy could estimate NS-NS parameters in the 2018 advanced LIGO 
era. Our algorithm is usable with any noise curve and existing time-domain model at any mass, 
including some waveforms which are computationally costly to evolve. 


I. INTRODUCTION 

The upcoming ground based gravitational-wave detec¬ 
tor network (notably including advanced LIGO [1] and 
advanced Virgo 0) are sensitive to the gravitational- 
wave signal from coalescing compact binaries, both the 
relatively well understood signal from the earlier inspi¬ 
ral phase UMl and the less well understood strong- 
held merger [T5]-[22]. Both of these regimes are impor¬ 
tant in understanding the underlying physical processes 
and properties of the binary which likely provide the 
central engine for phenomena such as short gamma-ray 
bursts [23H25] . A multimessenger observation of this cat¬ 
egory of event would be the hrst of its kind and could 
answer open questions about these poorly understood 
events. In the era of multimessenger astronomy, rapid 
and robust measurements of candidate compact binary 
gravitational-wave events will be a critical science prod¬ 
uct for gravitational-wave observatories, as colleagues 
with other instruments perform followup and coincident 
observations [26]. The most tantalizing proposed elec- 
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tromagnetic counterparts are expected to be brief, po¬ 
tentially disappearing within days if not shorter [27H33] . 
Given limited resources, reliable low-latency parameter 
estimation of gravitational-wave signals will signihcantly 
enhance the science output of multimessener astronomy. 

With the scheduled resumption of data taking in late 
2015 [26], the second generation gravitational-wave in¬ 
terferometers in Hanford, Washington, and Livingston, 
Louisiana, are expected to reach unprecedented sensitiv¬ 
ities [34]. The Virgo detector is also expected to resume 
data taking within a year of this milestone. In prepara¬ 
tion for the next run, the LIGO and Virgo Collaborations 
have implemented and extensively tested a set of low 
latency gravitational-wave detection pipelines [33] [35L 
[37], capable of compact binary event detection within 
a few minutes (or less) from the coalescence time. These 
pipelines trade the ability to accurately determine all but 
a few of the parameters of the coalescence for speed and 
breadth of analysis. 

Having a more accurate estimation of the bi¬ 
nary coalescence’s parameters is valuable not only to 
gravitational-wave science but also to electromagnetic 
observatories to guide their pointing. Many astrophys- 
ical phenomena which could create transient gravita¬ 
tional waves also have electromagnetic signatures which 
may decay rapidly. Moreover, gravitational-wave inter¬ 
ferometer networks often cannot localize gravitational- 




2 


wave events to better than a few hundred square degrees 
on the sky [26l [33l [38] , implying follow-up observations 
should occur expeditiously after the initial identification 
to promptly localize as accurately as possible. While sev¬ 
eral Bayesian algorithms for gravitational-wave parame¬ 
ter estimation have been employed in the past, the time 
scale of a full parameter estimation analysis remains at 
a much higher latency than the initial detection. 

With these goals in mind, and moving hierarchically 
from the broader search result, fast follow-ups have been 
developed to rapidly localize the event in the sky, assum¬ 
ing the search pipelines have provided accurate timing 
and masses [33l|39]. However, the most robust interpre¬ 
tation of gravitational-wave data requires systematically 
comparing all possible candidate signals to the data, con¬ 
structing a Bayesian posterior probability distribution for 
candidate binary parameters [4Qf[47] . These schemes pro¬ 
vide measurements of all the physical parameters of the 
coalescence, albeit at a much higher latency than the 
search itself. Owing to the complexity and multimodality 
of the model space, these strategies have adapted vari¬ 
ants of Markov Chain Monte Carlo (MCMC) or nested 
sampling @ 71118 ] algorithms to estimate the parameters 
of the coalescence. In physics, similar path-based meth¬ 
ods have been enormously successful at a broad range 
of physical problems, by exploring all possible paths 
through a configuration space; see, e.g. [Hilo]. 

Though successful, these nature of these algorithms is 
functionally serial, with moderate degrees of paralleliza¬ 
tion requiring intensive communication to coordinate the 
current state. Without state-of-the-art techniques like 
ensemble sampling, MCMC and nested sampling tech¬ 
niques therefore do not scale efficiently to beyond a few 
tens of cores (e.g., for parallel tempering). Convergence 
of these algorithms is also limited by ergodicity. No theo¬ 
rem guarantees they must explore the entire model space, 
let alone efficiently; no expression can robustly assess 
convergence, using available sampled data. In contrast, 
efficient and highly-parallelizable Monte Carlo integra¬ 
tion strategies are frequently applied to problems with 
dimensions comparable or higher than coalescing com¬ 
pact binaries; see, e.g., [5T]-[54]. In this work, we apply 
such methods to gravitational-wave parameter estima¬ 
tion for the first time. 

This work is organized as follows. In Sec. |TI|we pro¬ 
vide an executive summary, outlining the principles that 
enable our algorithm to provide rapid, accurate results 
for the test cases explored here: time-domain models for 
non-spinning, circular binaries. Then, in Sec. |III| and [rV| 
we outline the pertinent features of our waveform decom¬ 
position and describe our algorithm in detail. To demon¬ 
strate our algorithm provides high performance in an en¬ 
vironment mimicking conditions during an observational 
run. Sec. |V| presents results drawn from a large sample 
of events from the “2015 double neutron star mock data 
challenge,” results from which are described in [33] • 

Sec. |VI[ we discuss the broader significance of our result 
in the context of other parameter estimation work inside 


and outside the community. 


II. EXECUTIVE SUMMARY 

In this paper, we outline an alternative architecture for 
gravitational-wave parameter estimation, based on three 
key ingredients: efficient evaluation of the likelihood as 
a function of extrinsic parameters, highly-parallelized 
Monte Carlo integration over a grid of intrinsic param¬ 
eters, and input derived from gravitational-wave search 
and sky localization pipelines. 

The computational cost to evaluate the likelihood has 
historically been a limiting step in gravitational-wave pa¬ 
rameter estimation. This cost has two factors in each 
comparison which is required to be made. The first is 
the cost of waveform generation, as discussed in [55ll59] 
and references therein. Second is the filtering cost related 
to the required length of the waveform to be compared 
— this scales in proportion to the lowest frequency ac¬ 
cessible by a gravitational-wave interferometer. At fixed 
sampling rates, the lowest frequency content of the wave¬ 
form dominates a time-domain likelihood computation. 
Recently, several methods have been proposed to perform 
this comparison more efficiently [siHsaEni, by interpo¬ 
lating some combination of the waveform or likelihood or 
by adopting a sparse representation to reduce the com¬ 
putational cost of data handling. In this work, we intro¬ 
duce a robust and straightforward scheme to reduce the 
cost per comparison, without resorting to interpolation 
within or manipulation of the waveform family. We first 
perform a “precomputation” phase for each intrinsic pa¬ 
rameter by decomposing each physically distinct source 
in the spin-weighted spherical harmonic basis. This al¬ 
lows us to quickly evaluate the likelihood as a function 
of extrinsic parameters. This method is applicable to 
any available waveform model, and the computational 
cost of the precomputation step requires the waveform to 
be generated and decomposed only once, thereby mak¬ 
ing this step a small fractional cost of the total com¬ 
putation. To our knowledge, this work is the first time 
any such method has been implemented for the most ac¬ 
curate but computationally-expensive waveform models 
like EOB [61] [62] while leveraging several hundred com¬ 
puting cores at once. 

Taking advantage of the fast likelihood computation, 
each set of intrinsic parameters has a straightforward 
and vectorizable Monte Carlo integration performed to 
explore and marginalize over the extrinsic parameters. 
Monte Carlo integration can provide an exhaustive search 
of the parameter space with a fixed cost in convergence 
scaling with the number of points drawn. As each Monte 
Carlo draw is a statistically independent sample, this pro¬ 
cedure can engage as many computational resources as 
are necessary to provide fast convergence of the integral 
to a satisfactory level of uncertainty. Efficient Monte 
Carlo integration schemes adjust their sampling strategy 
while performing the integration by adapting to sample 
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only the meaningful regions of the space. By using de¬ 
tection search results (e.g. those produced by matched- 
filtered searches [63]) to sample near maxima of the like¬ 
lihood, Monte Carlo methods can retain generality but 
still be extremely efficient. We describe a procedure to 
explore the extrinsic parameter space and evaluate the 
marginalized likelihood that is embarrassingly parallel, 
extensibly incorporates information provided to acceler¬ 
ate convergence, and provides a well-understood sam¬ 
pling error estimate, allowing us to target a given level 
of precision. 


To complete the parameter estimation process, our 
algorithm evaluates the likelihood and accumulates in¬ 
trinsic posterior samples using a well-motivated discrete 
grid whose position and size is derived from the exist¬ 
ing information provided by gravitational-wave detection 
pipelines. To further accelerate convergence, we also use 
the information provided by the search to prioritize spe¬ 
cific combinations of extrinsic parameters for further in¬ 
vestigation. Like the process of marginalizing the like¬ 
lihood at fixed intrinsic parameters, this grid search is 
embarrassingly parallel. Typically, parameter estima¬ 
tion strategies make almost no use of the information 
reported by existing search codes, precisely to avoid self- 
consistency issues that can arise by, for example, using 
inferences for the data as priors on the reanalysis of that 
data. In contrast, by carefully distinguishing between 
sampling and integration priors, we circumvent these is¬ 
sues. 


To summarize, combining these three factors (paral- 
lelizable Monte Carlo integration, efficient likelihoods, 
and search information), we can provide reliable param¬ 
eter estimates for merging double neutron star binaries 
within one hour on existing hardware for instruments 
available in the next five years. This promising and 
simple first implementation has performance comparable 
to or better than well-developed state-of-the-art Markov 
Chain Monte Carlo codes like lalinference applied 
to initial detector data with comparable nonprecessing 
sources. For advanced instruments, with longer signals 
and more costly waveforms, our method should perform 
significantly better than the current LALINEERENCE im¬ 
plementation with state-of-the-art nonprecessing bina¬ 
ries. Our method therefore provides a valuable alterna¬ 
tive to other promising methods like reduced-order mod¬ 
eling, ensemble sampling, and other tactics to improve 
scaling and decrease latency |64l |65] . Owing to its supe¬ 
rior scaling and transparency, our method provides a well 
understood and easily implemented alternative to LALIN¬ 
EERENCE for suitable problems, particularly valuable for 
low-latency and approximate parameter estimation and 
for investigations into waveform systematics. 


III. BINARY WAVEFORMS 
A. Intrinsic and extrinsic parameters 

On physical grounds, we group waveform parameters 
ft = (A, 0) into two classes: the intrinsic parameters (A) 
and extrinsic parameters (0). The intrinsic parameters 
are fundamental to the description of the binary: if we 
change any intrinsic parameters we must recompute the 
orbital dynamics of the binary (typically through the rel¬ 
atively expensive process of numerically integrating ordi¬ 
nary differential equations). Extrinsic parameters sim¬ 
ply describe how the binary is oriented in space and time 
relative to the detector; changing extrinsic parameters 
involves a relatively inexpensive rotation, translation or 
rescaling transformation. As we will show in the next 
subsection, for the non-spinning case considered here the 
intrinsic parameters (transformations of the binary com¬ 
ponent masses mi and m 2 ) are 

X = {Mc,rj}, ( 1 ) 

where Me = (mim 2 )^/^/(mi + 7712 )^/^ is the chirp mass 
and T] = mim 2 l{mi + 7772 )^ is the symmetric mass ratio. 
The extrinsic parameters are 

^ ~ {^geo5 <^5 ^5 ^5-^5 0c} 5 (2) 

where tgeo is the time at which the coalescence point of 
the waveform arrives at the Earth geo center, a and 6 are 
the right ascension and declination, l is the inclination 
angle of the binary’s angular momentum vector and the 
line of sight to Earth, D is the luminosity distance to the 
binary^, 0 is the polarization angle, and (^c is the orbital 
phase of the binary at coalescence. 

B. Waveform decomposition 

The gravitational wave strain measured by the in¬ 
terferometric detector in a network is given by 

hk{t) = ^ Fx,k{^,a,^p)hx,k{t) , (3) 

where Fx,k are the antenna patterns of the detec¬ 
tor^ and hx,k are the two components of the grav¬ 

itational wave strain, evaluated at the detector. The 


^ For the sources considered in this paper, the redshift correction 
is assumed to be negligible. 

^ Due to the rotation of the Earth, the antenna patterns change 
as a function of time. While this has been mostly neglected due 
to the signal duration versus the Earth’s rotational velocity, an 
accounting of this effect will become necessary as the instruments 
become sensitive to longer binary coalescence waveforms. 
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antenna patterns depend only on the extrinsic sky loca¬ 
tion and polarization angle, while the polarizations de¬ 
pend on both intrinsic and extrinsic parameters. Mean¬ 
while, at leading order for inspiral-only waveforms the 
polarizations are described 4>(t) the orbital phase, and 
the post-Newtonian v{t) “velocity” parameters, these de¬ 
pending only on combinations of the masses of the binary. 
Thus, it is convenient to introduce h{t\A4c,V^ 4^c) given 
by: 


h+,k{t) = h{t -tk\Mc,r],i,(t)c) ■ (4) 

In this expression, tk denotes the time of arrival of the 
coalescence at the detector. 


^ Xk-N{a,S) 

t/c — tgeo ^ 

where Xk is a vector pointing from the geocenter to the 
detector and N{a, 6) is the direction of gravitational- 
wave propagation. So, each member of the network will 
have an offset relative to the geocenter time depending 
only on sky location. 

At this stage, we make no assumptions about the func¬ 
tional form of h(t|AIc 5 0c)- We can use a —2 spin- 
weighted spherical harmonic mode decomposition (de¬ 
noted him) to further separate intrinsic and extrinsic pa¬ 
rameters appearing in the polarizations as 


h+,k{t) -ihx^kjt) = (6) 

Im 


evaluated at some fixed distance I^ref (in this work we then we can re-express the measured strain in the k^^ 
choose = 100 Mpc). detector as 

If we define a complex-valued antenna pattern for each 
detector as 


Fk = F+^k + iFx,k , (7) 


hk(X, 0] t) = Re Fk{a, S, 'ip) him{Mc, r], tk\t) -<t>c) ■ 


D 


Im 


(8) 


Aside from t/c, we have now completely separated the 
intrinsic parameters (which enter only the him) from 
the extrinsic parameters (which enter only the Ff^ and 

Given a time-domain representation of the gravita¬ 
tional wave strain, we can define a frequency-domain ver¬ 
sion of this strain via a Fourier transform 


/ oo 

h(t)e-^^^7Ut . (9) 

-C» 

Time translation of frequency-domain waveforms is triv¬ 
ial. If h{tk] f) is the Fourier-domain representation of 
a strain for some arrival time then the same strain 
arriving at another time can be simply related by 




Thus, if we work with frequency-domain waveforms, the 
arrival time can be factored out as exp(—27ri/t/e) and 
we can complete the separation of intrinsic and extrinsic 
parameters. 


For the explicit decomposition above, we have focused 
on non-spinning waveforms and found that they can be 
separated into two intrinsic parameters and seven extrin¬ 
sic parameters. In the general case, which could include 
precession, tidal effects and any other physics, this in¬ 
trinsic and extrinsic separation is still possible. In fact, 
there will always be seven extrinsic parameters which en¬ 
ter only through inexpensive geometric factors, while any 
additional parameters will always be encoded in the ex¬ 
pensive him modes. 


To see that this is true, first note that Eq. (10) holds 
for an arbitrary strain and so can always be used to fac¬ 
tor out the dependence on time of arrival. Similarly, a 
gravitational wave far from its source will always fall off 
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as 1/D. Furthermore, the antenna patterns F+ and 
depend on the detector geometry, not the source, and 
so are unchanged. This gives a total of five extrinsic pa¬ 
rameters that in general can be factored out exactly as in 
our non-spinning waveform. Lastly, we have two angles 
which enter into the The physical interpreta¬ 

tion of these angles depends on the choice of the frame 
relative to which the harmonic mode decomposition is 
performed. One should choose a frame which is conve¬ 
nient for expressing and computing the him- For the 
non-spinning case, this means aligning the frame with 
the orbital angular momentum, L, in which case one can 
show that the zenith angle is t and the azimuth is —(pc- 
In the precessing case, one would most likely use a frame 
aligned with the total angular momentum, J, e.g. using 
the parameterization described in [66] . In the notation of 
that paper, the extrinsic spherical harmonic dependence 
is Y^~‘^\m{^JN^—pjL)^ where Ojn is the inclination of 
the total angular momentum to line of sight, and pjL 
marks the azimuthal position of L on its precession cone 
about J. 


IV. METHODS 

A set of data dk (t) collected from a gravitational-wave 
detector is typically decomposed into the inherent noise 
of the detector and a putative gravitational-wave signal: 

dk{t) = hk{t) + nk{t) . (11) 

In the absence of a signal and under the assumption that 
each detector produces stationary Gaussian noise, the 
noise nk{f) in the detector is characterized by its 
power spectrum Sk{f)- 


A. Bayesian evidence and posteriors 

If the detector noise is Gaussian, it follows that the 
probability of some set of noise realizations in each of 
our detectors is 


p{{d}\no) cx JJexp ( 15 ) 

k ^ ^ 

in the absence of a gravitational-wave signal. Here, 
indicates the hypothesis that the data is only Gaussian 
noise. 

With a gravitational-wave signal present, the data 
from each detector is noise plus the response of an in¬ 
terferometer to the gravitational-wave h(/i). In this case, 
the probability of some measured d given the presence of 
the signal with parameter values jl is 

(16) 

where l-Li indicates the hypothesis that the data consists 
of Gaussian noise plus a gravitational-wave signal. 

By Bayes’ theorem, the posterior probability of the 
parameters jl under the signal present hypothesis 'Hi is 


where 


p{il,'Hi\{d}) 


p{{d}\il,'Hi)p{p) 

p{{d]\Hi) 


(17) 


p{{d]\ni) = Jp{{d}\il,Hi)p{il)dil. (18) 

It is convenient, at this stage, to introduce the likelihood 
ratio 


{nk{frnk{f)) = lSk{m{f-f). ( 12 ) 

In the following discussion, we define a weighted 
inner-product of two complex Fourier-domain functions 
with a weighting function 1/S{f): 


cmd}) = ]i 

k 


eyjp {-{d - h{il)\d-h{p))k/2} 
exp{-(d|d)fe/2} 


and to rewrite Eq. 0 in the form 


(19) 



S{f) 


(13) 


p{p,ni\{d}) 

where Z is defined to be 


^P\{d})p{p) 

Z 


( 20 ) 


The mode decomposed waveforms him^ can be shifted in 
time in reference to a detector relative to the geocentric 
time via Eq. ( [Tq| ). Eor this reason, the overlap between 
a single detector data time series d{t) and a time-shifted 
complex function h{t — tk) is 


{hitk)\d) = 21“ . (14) 

Note that this is simply the Eourier transform of the in¬ 
tegrand in Eq. (13), which means we can compute the 


overlap for all possible time shifts with a single Eourier 
transform. 


z = Jcmd^pipw s ^ ( 21 ) 

We can also compute the posterior probability density 
(which we will abbreviate as simply the posterior) for 
one or multiple parameters (marginalized over all other 
parameters). Let x be one or more parameters in jl and 
y be all other parameters, such that jl = x U y. Then, 
the posterior for x is 

Pi^Hilx) = j dy p{y\ni)C{x,y) . (22) 
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B. Efficiently evaluating the likelihood 

We now show how the waveform decomposition de¬ 
rived above can be exploited to speed up evaluations of 
the likelihood ratio. In Eq. note that we have taken 
the observed strain in a detector and rewritten it 

as a linear combination of harmonic mode time series 
The harmonic mode time series depend on the 
intrinsic parameters, but the extrinsic parameters (apart 
from tk) are entirely encoded in the coefficients of the 
linear combination. Because computing the likelihood is 
just an inner product, which is a linear operator, we can 
pull these coefficients outside the inner product integral. 
Thus, we need only compute inner products involving 
the him and data {d}. If we store these, we can compute 
the likelihood for any extrinsic parameters by simply re¬ 
computing the coefficients and reconstructing the linear 
combination. 

To that end, we define the following quantities: 


Note that each Q/c,zm(A, t/.) is computed for all tk with a 
single inverse Fourier transform, as in Eq. (14). A signal 
will produce a peak in the filtered outputs localized to 
a short millisecond time window around the coalescence 
time, so the Q/c,zm(A, will be sharply peaked as func¬ 
tions of tk and we need only retain the values for a narrow 
range of tk- To allow for detector arrival times that dif¬ 
fer from the geocenter time, the range of tk for which we 
must store the Qk,im is set by the light travel time across 
Earth {2R^/c 42ms). Conservatively, work we store 

the Qk,im for a much longer 300 ms range of tk- The 

Uk,im,i'm'W and Vk,im,i'm'W are independent of tk and 
are computed once by a straightforward use of Eq. (13). 


Qk,lm{^Rk^ — \ ^/c) 1^ 


= 2 


(23a) 




(23b) 

(23c) 


By plugging Eq. <§ into Eq. ( [l^ , taking a log and 
collecting terms, we obtain 


ln£(A;^) 


{Dr,f/D)Re 

Qk,lm (X tk) 

k Im 


{Drei/D)^ 

4 


k Iml'm' 


*y(-2) 


Importantly, the intrinsic parameters A enter only 
through the Qk.im, Uk,im,i'm' and These are 

the dominant cost, as they require computing the orbital 
dynamics, the inner product integrals and inverse 
Fourier transforms. By contrast, the extrinsic parame¬ 
ters enter the Fk and which are much cheaper 

to compute. 


Therefore, if we fix a point in the intrinsic parameter 
space we can compute the Qk,im{\tk), Uk,im,i'm'W and 
Vk,im,i'm' (A) only once, vary the extrinsic parameters and 
compute the likelihood for only the cost of the Fk and 
This allows us to efficiently integrate over the 
extrinsic parameters and obtain a marginalized posterior 
for the intrinsic parameters p{X) as in Eq. (22). If we do 
this for a collection of points in the intrinsic parameter 


Z'm' (^)Tl4e ^Fj^Y^ 




I'm' 



(24) 


r 


space, we can integrate over A as well and obtain Z as 
in Eq. (21). Note that the computation for each point in 


the intrinsic space is completely independent of the oth¬ 
ers. This makes the algorithm embarrassingly parallel 
and given enough CPU cores the entire analysis can be 
run in the time it takes to integrate over the extrinsic pa¬ 
rameters (modulo some brief startup and post-processing 
steps). 

The remainder of this section provides more details on 
the various steps of our algorithm. 


C. Placement over intrinsic parameters 

In the current work, we restrict ourselves to consider non¬ 
spinning binaries in which we also neglect tidal effects. 
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FIG. 1: We use an effective Fisher matrix to compute an 
approximate ellipsoidal region of overlap > 90% with the 
masses reported by a detection pipeline. We then fill this 
ellipsoid with discrete points and cut any with unphysical 
values of symmetric mass ratio rf. In this case mi — l.SM© 
and m 2 — l.SSM©, so a little under half of the points placed 
would have been unphysical. At each physical grid point, 
we marginalize the likelihood over all extrinsic parameters as 
described in Sec. 


Therefore, we need only two mass parameters to describe 
the intrinsic parameter space, for which we use the sym¬ 
metric mass ratio rj = mim 2 /M^ and the chirp mass 
A4c = (where M = mi + m 2 is the total mass). 

Since detection searches will report masses for the can¬ 
didate event, we use these to guide which region of the 
intrinsic parameter space to explore. 

Let (AT*, 77 *) be the masses reported by a detection 
pipeline. We then perform an effective Fisher matrix 
calculation as described in miEH] centered about this 
point. This involves evaluating the overlap 


l(fe(V)|fe(A))| 

^ {h{X*)\h{X*)){h{X)\h{X)) 


(25) 


between our waveform with masses (AT*, 77 *) and approx¬ 
imately tens of nearby waveforms with different intrin¬ 
sic parameters (while extrinsic parameters are held con¬ 
stant). The measured overlap values are then fit with 
a multi-dimensional quadratic. The coefficients of this 
quadratic fit are called the effective Fisher matrix. Like 
the standard Fisher matrix, the effective Fisher matrix 
serves as a quick, crude estimate of expected parame¬ 
ter estimation performance and can be used to predict 
surfaces of constant overlap, which will in general be el¬ 
lipsoids. In this work, we use the effective Fisher matrix 
to approximate the region of intrinsic parameter space 
which will have overlap > 90% with the masses (AT*, 77 *). 

Once we have defined this 90% overlap ellipsoid, we 
must fill it with a set of discrete points at which we will 
compute the likelihood. To do this, we first specify the to¬ 
tal number of intrinsic parameter points we wish to place, 
here 200 points. Then, these points are arranged within 


a unit sphere. In this work, we placed points along 20 
radial lines, with 10 points per spoke. Along each spoke, 
the points are placed uniformly in radial distance. Now, 
the eigenvalues and eigenvectors of the effective Fisher 
matrix tell us the lengths and orientations of the axes of 
the 90% overlap ellipsoid. We use these to deform and ro¬ 
tate our set of points in the sphere to a set of points in the 
90% overlap ellipsoid. Once we have filled our ellipsoid, 
we remove any points that have unphysical 77 (> 1/4). 
We ensure that we always place spokes in our ellipsoid 
along the direction of constant 77 , so that we always have 
many points along this boundary of the parameter space. 
One reason for the choice of spoked placement is so that 
we can always ensure near-equal mass binaries will have 
many points near the 77 = 0.25 boundary of the param¬ 
eter space, the region where most astrophysical binary 
neutron stars are expected. Fig. illustrates this place¬ 
ment of intrinsic parameter points. 


D. Integrating over extrinsic parameters 

Because precomputed quantities allow us to efficiently 
evaluate the likelihood £(A, 0) as a function of 6 >, we first 
integrate the likelihood C{\^0) over all extrinsic param¬ 
eters 0 to get 


Cred{x) = J c{x,e)p{0)de, (26) 

where p{0) is our prior over the extrinsic parameters. We 
assume the sources analyzed are randomly-oriented and 
randomly distributed in the universe out to a fiducial ra¬ 
dius^. With the potential exception of the sky position, 
our priors are independent, and thus separable. We then 
evaluate the reduced likelihood >Cred by integrating over 
the geocentric time of arrival t, distance D, sky position 
ftsky represented as right ascension a and declination 6, 
angular momentum orientation as measured by inclina¬ 
tion L, polarization angle 7 /, and coalescence phase 0 c 5 
using the separable prior to get 


>Cred(A) = 

f dt D^dDdQsky d{cosL)d(j)cdip 

(27) 

In this expression and our calculations, we adopt a max¬ 
imum distance Dmax (here 300 Mpc, beyond which we 
do not expect appreciable sensitivity from a 2015 era in¬ 
terferometer) and a time window T^indow (here, 300 ms) 
surrounding the event. 


^ Beyond which we do not expect our instruments to be apprecia¬ 
bly sensitive in the next two years. 















With the exception of time (described below), we eval¬ 
uate these integrals and reconstruct the posterior distri¬ 
bution using Monte Carlo integration [69l ITO] . If Ps (^) is 
a distribution which is never zero, then 

£red(A) = / • ( 28 ) 

J Ps{ 0 ) 

If we draw N random samples Oq from ps , we can estimate 
the value of the integral jC^ed and its error cycled follows. 
Let 


, _c{\e,)p 0 <,) 

Wq - 

Ps{e,) 

for a given draw g of 6>, then 


(29) 


q 

4red = • (31) 

The weighted samples also provide an estimate of the 
marginalized one-dimensional cumulative distributions 
P{< x) at fixed A: 


P{< x) = ^^'^WqQix -Xq), (32) 

q 

where x is one of the extrinsic variables in and 0 is 
the Heaviside step function. This formula follows by per¬ 
forming Monte Carlo integration on the parameter vol¬ 
ume < X, keeping track of overall normalization. In the 
limit of many samples, this discontinuous estimate should 
converge to a smooth, marginalized posterior distribu¬ 
tion. For any set of samples and any x, the error in 
P follows from the (correlated) statistics of the Monte 
Carlo integrals in its numerator and denominator. In the 
typical case that all samples Xq are distinct, the unique 
sample with the largest weight corresponds to the largest 
discontinuity in P. The magnitude of this discontinu¬ 
ity, or equivalently its inverse rieff, provides a practical 
measure of how reliable we expect this one-dimensional 
posterior to be: 


_ q 

^eff - r 

max|req| 


(33) 


Equivalently, the “effective number of samples” neff mea¬ 
sures how many independent samples produce similar 
weights near the largest observed weight. 

Following the discussion of priors earlier in this sec¬ 
tion, unless otherwise indicated, we draw samples using 


a separable sampling distribution Ps{0) = 

Each factor ps^a is equal to the corresponding prior in 
that dimension. 

Furthermore, being a pure Monte Carlo integral, all the 
draws are independent. We can simply average the re¬ 
sults of multiple instances to achieve a smaller , even 
if these evaluations adopted to a different sampling distri¬ 
bution. Currently, we perform ntriais (= 10) evaluations 
of the integral at each fixed intrinsic point, terminating 
when either N iterations crosses a threshold (= 10^) or 
to some fixed neff threshold (= 1000), whichever comes 
first. This approach is thus highly parallelizable: simply 
instantiate instances (with different seeds) of the desired 
integral across however many computing resources are 
required for the target execution time and precision, ef¬ 
fectively dividing the time required to converge by the 
number of processes available. Combined with gridding 
the intrinsic parameters, this allows for a degree of paral¬ 
lelization which has not been achieved by other sampling 
methods in this parameter space. There is a practical 
limit to this strategy since there is a fixed start up time, 
but the degree of parallelization realized is at least an 
order of magnitude or more over current schemes. 


1. Adaptive Monte Carlo integration 

To better concentrate our samples in regions of high 
significance, we implemented an importance sampling al¬ 
gorithm using a simple adaptive Monte Carlo procedure. 
Every riadapt samples, we updated the sampling distri¬ 
bution based on measured weights w^. We choose to 
temper the distribution by raising the likelihood to an 
exponent which is chosen heuristically based on the net¬ 
work SNR and the particulars of the adaptive histogram. 
This choice attempts to mitigate the large dynamic range 
of C in Eq. |^to a scale comparable to the number of 
samples used in each histogram. At the end of every 
adaptation period, we reconstruct the one-dimensional 
sampling distributions in each adapting dimension, us¬ 
ing the last nadapt samples. In each dimension (6>), we 
subdivide the full range into nbins equal-length bins then 
evaluate a tempered, weighted histogram. Since the di¬ 
mensionality of the space is high, but sparse, there is a 
danger that early adaptations will have one-dimensional 
sampling distributions which suffer from fluctuations in 
other parameters after marginalization. To avoid this ef¬ 
fect and better ensure the full prior space is covered ade¬ 
quately, we then average the histogram W with a uniform 
distribution with weight s: 


f^a=5f^a + (l-5)/nbins • (34) 

Finally, we transformed from this discrete, bin-by-bin 
representation to a continuous integral by constructing 
a one-dimensional sampling distribution Ps,a{^a) by in¬ 
terpolating between bin centers, then constructing the 
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one-dimensional inverse Ps,a{< ^a)~^ by integrating 
Ps,a{^a) — Ps,a{^a)- The latter process ensures that 
the sampling distribution and inverse cumulative distri¬ 
bution used to generate random samples are normalized 
and self-consistent. 

As configured for this paper, we used ribins = 
100,nadapt = 1000, and s = 0.1. The adapted sampling 
distributions are frozen in after 10 ^ samples are drawn. 

The adaptive sampler was not used for all parameters. 
The parameters that are useful to adapt often have a 
significant dynamic range (i.e., distance) or could be rep¬ 
resented with more detailed prior information (i.e. sky 
location). Conversely, parameters whose marginalized 
posterior is expected to resemble the prior, like orbital 
phase and polarization, were not adapted as our one¬ 
dimensional method would provide no benefit. 


3. Time marginalization 

Having already computed the Qk,im{^Pk)^ we can 
evaluate the likelihood versus time jC{t) cheaply, by ar¬ 
ray addition operations. Hence, rather than performing 
Monte Carlo integration, we can likewise efficiently inte¬ 
grate over time. 

Specifically, for every sky location drawn by the adap¬ 
tive Monte Carlo routine, we compute the correspond¬ 
ing time shift between geocenter and each detector with 
Eq. (§. To evaluate the value of In C for some geocenter 
time tgeo, we simply look up the corresponding value of 
the Qk,im{^Pk) from our precomputed values and plug 
them into Eq. The resulting time series jC{t) is then 
numerically integrated using Simpson’s rule in a window 
of 300 ms, centered on tgeo- 


2. Using search results to target specific areas of the sky 


E. Intrinsic priors and posterior construction 


Gravitational-wave search pipelines have usually al¬ 
ready identified candidate event times in two or more in¬ 
terferometers’ data, typically to much less than the light 
crossing time of the earth. 

However, the initial search pipeline is not designed 
to provide estimates of the sky location, so the 
BAYESTAR pipeline rapidly processes the results of 
a gravitational-wave search to identify candidate sky lo¬ 
cations consistent with a gravitational-wave event, using 
only the information provided by the top-level search. 
This code produces a posterior distribution Pbs{(^^^) 
for the gravitational-wave signal sky location in a dis¬ 
cretized, interlocking, equal-area grid of pixels corre¬ 
sponding to sky regions, with probabilities for each pixel 
(a HEALPix^ skymap). The refinement of the grid is 
variable and scales with the resolution required to resolve 
the features of the posterior in a and 6. 

While we could allow the adaptive sampling to reduce 
the sky area required to be searched, we can also use 
the two-dimensional {a, 6) posterior from BAYESTAR as a 
sampling distribution to immediately target our Monte 
Carlo at a region of high support. This is more effective 
than the adaptive sampler because the one-dimensional 
sampling distributions, even if well converged to the true 
marginalized one-dimensional posteriors, over cover the 
two-dimensional space, thus limiting their effectiveness 
in sampling only where support truly exists. 

Eor simplicity, when using a BAYESTAR skymap, we 
adopt a purely discrete sky: the samples are selected from 
the set of sky region centers as calculated by the healpix 
library. In practice, the pixels are at a sufficient resolu¬ 
tion such that detrimental effects from the discretization 
are rarely noticeable. 


http: //healpix. sourcef orge. net/ 


At this point, we have evaluated >Cred(A) over a struc¬ 
tured grid of intrinsic parameters (here indexed by r) A^. 
By construction, the values >Cred(A) have small statistical 
errors (e.g., typically less than a few percent). We can 
then interpolate >Cred(^) throughout the sampled grid. 
Combined with the prior p{X) over intrinsic parameters, 
we evaluate the overall evidence Z and posterior distri¬ 
bution p{X) over intrinsic parameters via 


Z = j p(A)£,ed(A)rfA , 

(35) 

p{X) = |p(A)£,ed(A) . 

(36) 


To construct posteriors for the intrinsic parame¬ 
ters we adopt a uniform prior in mi,m 2 

with mi, m 2 G [1,2]Mq: inside the specified region, 
the (uniform) prior density is p{mi,m 2 )dmidm 2 = 
dmidm 2 /{MQ)^. Changing coordinates using the Ja¬ 
cobian d{mi,m 2 )/d{Mc,v) = 5MclM‘^ = 
where 6 = (mi — m 2 )/M = we find the prior 

density in coordinates is 


p{Mc,r])dMcdr] 


1 M.cdM.cdrj 
ATq 776/5^1 — 477 


(37) 


In other words, due to a coordinate singularity, the prior 
in 77 diverges near the equal-mass line. This coordinate 
singularity has a disproportionate impact on comparable- 
mass binary posteriors. 

However, unlike MCMC and nested-sampling codes, 
we can reweight our results, allowing the user to re¬ 
process the posterior using any user-specified intrinsic- 
parameter prior. As a result, our approach also enables 
several other calculations of practical interest: reanal¬ 
ysis given alternative astrophysical priors, simultaneous 
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analysis of multiple events, adapting the “prior” intrin¬ 
sic (mass, tide [711I73], cosmological parameter [74HZ6]) 
distribution to reproduce multiple observations, and si¬ 
multaneous independent constraints from multimessen¬ 
ger observations. 

To a good approximation, the intrinsic and extrinsic 
parameter distributions often separate after marginaliz¬ 
ing in time. In other words, after marginalizing in time, 
the extrinsic parameter distributions are nearly indepen¬ 
dent of A. In that case, each individual extrinsic param¬ 
eter distribution provides a reliable estimate of the pos¬ 
terior. As a crude approximation to the one-dimensional 
cumulative posterior distribution P{0), we simply aver¬ 
age all weighted samples Wr^q) from all mass points: 


Y,Wr,qQ{x - Xr,q) 

P{< x) = -=- , (38) 

/ V 

r,q 

where 0 is the Heaviside step function. 


V. RESULTS: RUNTIME DETAILS AND 
LARGE SCALE TESTS 


We present a proof-of-principle study of our parame¬ 
ter estimation pipeline, employed in a environment which 
should resemble the steps taken after the identification 
of search triggers. To this purpose, a mock data chal¬ 
lenge was constructed to exercise both the low latency 
gravitational wave searches as well as the parameter es¬ 
timation follow-up processes expected to be applied to 
gravitational-wave candidates identified by the search 
[33] . This challenge proceeded in several stages, each 
designed to emulate the anticipated workflow in a low la¬ 
tency binary neutron-star (BNS) detection environment. 

From the reported search pipeline results, we use the 
A4c and 77 coordinates from which the intrinsic search 
space is extrapol ated, an d the window for time marginal¬ 
ization (see Sec. IVD 3[ ) is formed around the reported 
coalescence time The search pipeline also provides 
reference power spectral densities S{f) utilized in evalu¬ 
ating the likelihood. 

Unless otherwise stated, the likelihoods were evalu¬ 
ated using nonspinning TaylorT4 templates at 3.5 post- 
Newtonian (PN) order, including only the (2, ±2) and 
( 2 , 0 ) modes. As we will discuss at length below, this 
signal model does not include all degrees of freedom 
permitted to the binary in the data. When evaluat¬ 
ing the likelihood, waveforms started at /low = 40Hz. 
The inner product integration Eq. (13) uses an in¬ 


verse power spectrum filter targeting the frequency range 
[/min,/max] = [/min, 2000Hz], constructed from the mea- 
sured power spectrum. 

Only distance used an adaptive sampling function, 
starting initially with a constant function. Our distance 
prior was uniform in volume out to D = 300Mpc. The 


declination and right ascension were sampled from the 
posterior provided by BAYESTAR. As used here, the 
sky map had 12 x 64^ pixels, roughly one per square de¬ 
gree. 


A. Ensemble of events 


We first present the agglomerated results from 450 
events drawn randomly from a set of recovered events 
in EH [77]. These events were also processed using 
BAYESTARand a subset were processed by the LALIN- 
FERENCE Monte Carlo sampler codes. All events have 
false alarm rate — as determined by the GSTlal search 
pipeline — smaller than one per century. This threshold 
is motivated by the selection criteria outlined in [26] . As 
in [ 33 ], this data set adopts a noise PSD set at the me¬ 
dian sensitivity from fiducial predictions for 2015 LIGO 
observing OIZHI- It is expected that the Virgo inter¬ 
ferometer will not be functional during this time period, 
leaving a two detector LIGO site network. 

A population of gravitational-wave signals from BNSs 
was added into the data, as described in [33| and reviewed 
here. Events in this set (the 2015 MDC data) were dis¬ 
tributed isotropically on the sky, and uniformly in volume 
out to 219 Mpc. The BNS injections had uniform ran¬ 
dom component masses in I. 2 M 0 — I.GMq and randomly 
oriented spins with the dimensionless magnitude not ex¬ 
ceeding 0.05. These (precessing) binary signals were gen¬ 
erated via precessing SpinTaylorT4 templates at 3.5 PN 
order [6], including all known post-Newtonian modes m- 

Eirst, the GSTlal BNS search was performed over 
the MDC set, identifying events for further follow up. 
Eor each event identified by the gravitational-wave search 
pipeline, we distributed the intrinsic points according to 
the procedure described in Sec. |IV C| and evaluated the 
Monte Carlo integral 10 times for each mass point, com¬ 
bining each of the runs together after. We outline the 
form of the priors and sampling functions for each pa¬ 
rameter in Table [IJ 

In addition to the information provided by the search 
pipeline, we also make use of the posterior probability 
of a and 6, provided by BAYESTAR. It is expected that 
such information will be available within a few minutes of 
trigger identification. The gains in time to convergence 
in the Monte Carlo integral are expected to outweigh the 
loss of time waiting for BAYESTAR to finish processing if 
the BAYESTAR posterior is used as a sampling function 
for the sky location. 

In order to validate the results of the pipeline, we 
present a comparison of the ensemble distribution of re¬ 
covered parameters versus their intrinsic distribution as 
determined by the prior — a so called PP plot — in Eig. 
[ 2 J If our estimates for the one-dimensional cumulative 
distributions P(< x) are unbiased and if is a random 
variable consistent with the prior, then P{x^) should be 
a uniformly-distributed random variable. To test this hy¬ 
pothesis, we use the one-dimensional posteriors provided 
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parameter 

physical prior 

initial sampling distribution 

distance (D) 

uniform in volume 

uniform in distance 

inclination (^) 

uniform in cos^, i between (— 7r,7r) 

same 

polarization (?/;) 

uniform in (0, 2 tt ) 

same 

orbital phase (0) 

uniform in (0, 27r) 

same 

declination ((5) 

uniform in cos (5, 5 between (— 7r,7r) 

drawn from BAYESTAR skymap 

right ascension (o) 

uniform in (0, 2 tt ) 

drawn from BAYESTAR skymap 


TABLE I: Prior distributions used for the extrinsic parameters in the 2015 MDC study. 



FIG. 2: For 450 randomly-selected NS-NS binaries, a plot of 
the cumulative distribution of Pe{0k) for each extrinsic vari¬ 
able 0 = D, a, S, (j). The curves represent the normalized 
fraction of events with estimated cumulative probability less 
than the injected parameter value. 


by the MDC. 

For each parameter x, each colored curve in Fig. |^is 
the fraction of events with estimated cumulative proba¬ 
bility P{< x^) at the injected parameter value x^. Specif¬ 
ically, if P{x^q) are the sorted cumulative probabilities 
for the q = 1.. .n events with P{x^i) < P(x* 2 ), then the 
points on the plot are {P{x^q),q/n}. 

There are a few possible sources of bias in the recovery 
of the posterior distribution. While the spins are nom¬ 
inally mild, it is possible that the use of non-spinning 
templates in the signal analysis for both the GSTlal and 
our parameter estimation codes could cause biases. Addi¬ 
tionally, the event selection process outlined in [33] intro¬ 
duces a small selection (Malmquist) bias which slightly 
disfavors edge-on binaries relative to our uniform prior. 
Our parameter estimation strategy does not account for 
selection biases; for a sufficiently large ensemble of events, 


small deviations between the statistical properties of our 
posteriors and the ensemble are expected. 


B. Marginalization details 


Two events were selected for demonstration. One event 
represents a marginal detection by the GSTlal pipeline. 
It is suggestive of what might be seen in the 2015-2016 
era, and is not a confident detection by itself. The second 
event is a “gold-plated” event, one which is an assured 
single-event detection by the GSTlal pipeline. They are 
qualitatively comparable in most regards, however, the 
stronger event (7^21091), being more strongly peaked in 
likelihood did not accumulate as many effective samples, 
and hence the error on the reduced likelihood (>Cred(A)) 
tends to be larger by an order of magnitude, yet still 
small in the relative sense. We provide the simulation 
and GSTlal event IDs so that the interested reader can 
cross-correlate results here with those obtained by other 
samplers in |33j. We will refer to them by their GST¬ 
lal IDs #21091 and #14631. Their properties are enu¬ 
merated along with the values obtained by the GSTlal 


pipeline and the recovered parameters in Table VB 


Each event was gridded in the elliptical region of 10% 
mismmatch in the same manner described in Sec. MU 
After unphysical points were thrown out, there were 128 
points for event #14631 and 124 points for event #21091. 
Each point had ten independent instances of the integra¬ 
tor applied, both to accumulate samples in parallel fash¬ 
ion as well as to cross-validate the results. The approx¬ 
imate wall clock time for any given instance was about 
45 minutes. 

The evolutio n of th e Monte Carlo integ ral evaluation is 
shown in Figs. |3(a) (#21091) and |3(b)| (#14631). This 
is for a single mass point corresponding to the center of 
the ellipsoid constructed for sampling the Mc^rj plane, 
and so also the mass reported by the GSTlal search. 
As explained in section |IV C| each mass point is inde¬ 
pendently evaluated 10 times. Each integral evaluation 
is shown in the top panel as a function of the number 
of samples drawn to that point. The blue regions sur¬ 
rounding each line are the error estimate at that point 
in the sampling. It is readily apparent that the integra¬ 
tors have sampled near the maximum po int w ithin a few 
thousand samples (middle panel, Figs. |3(a)| and |3(b)[ ), 
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FIG. 3: Figs, (a) and (b) are correspond to events #21091 and #14631, respectively. The top panel in each figure shows the 
estimated value (black lines) of the integral, for each of the ten integrators, through a given sample count, l-cr error regions 
are shaded in blue around the curve. Additionally, the value of the integral if all ten integrators are combined is shown in red. 
The middle panel shows the maximum value found by the integrator instance throughout the iterations. The last panel plots 
the number of effective samples (Eq. ( [3^ ) in black, with corresponding ordinate axis the right. In the same panel, the relative 
error (Eq. (31)) is plotted in blue, with corresponding ordinate axis on the left. 
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GSTlal Search ID #21091, Injection ID #26465 GSTlal Search ID #14631, Injection ID #30639 


Parameter 

Recovered 

Injected (GSTlal search) 

Recovered 

(parameter estimation) 

median / mode Parameter 

Recovered 

Recovered (parameter estimation) 

Injected (GSTlal search) median / mode 

mi (Mo) 

1.60 

1.68 

1.55 / 1.55 

mi (M©) 

1.32 

1.49 

1.49 / 1.49 

m2 (M©) 

1.40 

1.33 

1.54 / 1.54 

m2 (M©) 

1.28 

1.14 

1.13 / 1.13 

D (Mpe) 

68 

- 

59 / 67 

D (Mpe) 

102 

- 

85 / 101 

L 

3.0 

- 

1.3 / 0.47 

i 

3.09 

- 

1.53 / 2.76 

3 

-0.12 

- 

-0.06 / -1.02 

3 

0.28 

- 

0.28 / 0.91 

a 

4.00 

- 

3.9 / 0.82 

a 

1.89 

- 

2.39 / 1.83 


3.12 

- 

3.08 / 1.01 


2.32 

- 

3.14 / 3.33 


4.03 

- 

3.08 / 0.94 

0 

5.73 

- 

3.08 / 2.26 



Other parameters 



Other parameters 



Recovered 

Recovered 



Recovered 

Recovered 

Parameter 

Injected (GSTlal search) (parameter estimation) Parameter 

Injected (GSTlal search) (parameter estimation) 

Ixil 

0.026 

- 

- 

Ixil 

0.032 

- 

- 

|X2| 

0.008 

- 

- 

1X2 1 

0.036 

- 

- 

p 

16.28 

17.24 “ 

16.85*’ 

p 

11.06 

12.04^ 

11. 01^^ 


“with corresponding false alarm rate 3.8 X 10 
^as measured by 2^1n(max{Lj}) 


TABLE II: Injected and recovered intrinsic and extrinsic parameters for an injected signal. The GSTlal search only reports 
time of arrival, signal-to-noise ratio, and mass information. The parameters from our algorithm are quoted at a weighted 
median and mode of the marginalized posteriors. Some other parameters not considered in this study (e.g. the dimensionless 
component spin Xi, 2 ) are listed separately. 


and at this point the integrators all begin to converge 
towards the same evidence value. Within 10^ samples, 
most of the integrators have highly overlapping error re¬ 
gions, and only moderately small changes in the max¬ 
imum likelihood point searched. The relative error for 
each integrator is plotted in blue in the bottom panel, 
with the typical 1/^/N behavior exhibited. The “jumps” 
in the curves correspond to a new maximum being found 
in the integrand. The final relative error for any of the 
integrators is typically of order 5% or less for all cases 
(with only one 10% relative error for 7^21091), and typ¬ 
ically less than 1% for 7^14631. The red line in the top 
panel of the figures shows the evidence value if all ten 
integrators were joined as one integrator. 


Figs. 4(a)| (#21091) and |4(b)| (#14631) display the 


results of each of the ten integrator instances scattered 
in the A^eff, evidence plane. The dashed line indicates 
the value of the evidence as calculated across all sam¬ 
ples from all ten points. As can be noted, the error on 
the evidence values are small and most of the points are 
clustered near the total evidence dashed line, further in¬ 
dicating that all ten of the integrators have reached an 
internally consistent value, with very little spread around 
the average. These evidences are then averaged together 
to obtain the reduced likelihood for that mass point. A 
contour plot of the evidence for this event is shown in 


Figs. |5(a)| (#21091) and 5(b)| (#14631), with the ac¬ 


tual value oi Me and rj of the injected signal marked by 
the crosshairs and the value obtained by the GSTlal 
search pipeline is at the center of the ellipse. The ev¬ 
idence follows roughly the expected quadratic-shape of 
a Fisher-matrix manifold for these two parameters, but 
the maximum evidence point is slightly offset from the 


true value. We expect that this difference arises from the 
mild spin of the system biasing the result, as the wave¬ 
form family used to generate templates and calculate Cj-ed 
does not include the effects of spin. 


All possible sets of one and two-dimens ional posteriors 
over the parameters are shown in Figs. |6(a) (#21091) 
and |6(b)] (#14631). We also display here the posteriors 
for the intrinsic parameters (A4c and r]). The discreteness 
of the grid makes determination difficult for the compo¬ 
nent masse s, how ever, the Me and r] posterior (using 
the p rior in |IVE[) is consistent with the result shown in 
Figs. |5(a)| and |5(b) In both cases, the distinct degen¬ 
eracy between distance and inclination is clearly shown. 
In the case of #21091, a southerly inclination is weakly 
favored; this being an example of a bimodal distribution 
where the wrong mode is selected. The better sampled 
#14631 does choose the correct mode, but the degener¬ 
acy between the two is still quite strong, and the true in¬ 
clination is nearly exactly face on. Also apparent in both 
cases is the favoring of moderately off-axis binary con¬ 
figurations which tend to bias the distance measurement 
towards systematically closer values. The p{a, S) poste¬ 
rior, in effect the sky position, resembles the bayestar 
skymap which is used by our algorithm to select sam¬ 
ple points for these parameters. For #21091, there are 
two modes along the triangulation ring, corresponding 
to a maxima and its mirror image. The sampler was 
not able to resolve the degeneracy between the points for 
#21091^, but the true location of the event lies near one 


^ Though neither did the samplers in m- 
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(a) 



100 150 200 250 300 350 400 450 


(b) 

FIG. 4: Figs, (a) and (b) correspond to events #21091 and #14631, respectively. Each of the 10 integrator results is plotted 
against the reduced likelihood and number of effective samples collected (A^eff)- The error bars represent the uncertainty in the 
integral value from statistical error. The horizontal blue dashed line represents the value of the evidence averaged over all 10 
integrator estimates. 
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FIG. 5: Figs, (a) and (b) correspond to events #21091 and #14631, respectively. Level contours of the interpolated evidence 
surfaces in Me and r] space. The points are the values of Me, t], which were used to do the integral over the extrinsic parameters, 
and all 10 integrator instances have been averaged to get the final value. The injected value is marked with a large cross. 
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(a) 



(b) 


FIG. 6: Posteriors for event #21091, left and event #14631, right. All two-dimensional posteriors are represented as off-diagonal 
elements, and the diagonal elements are the one-dimensional fully marginalized posteriors. The true values of each parameter 
are marked with X’s in the two-dimensional plots and vertical lines in the one-dimensional plots. From left to right, the 
parameters are Me, ?7, l, distance, right ascension, declination, polarization, and coalescence phase. Note that the geocentric 
arrival time is omitted because it is marginalized analytically. 
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of the maxima on the ring. In the case of 7^14631, one 
mode is suppressed relative to the other, and the true 
location lies very near the maximum. Also notable is the 
degeneracy between orbital phase (0) and polarization 
angle The recovered posterior’s strong degeneracy 
both qualitatively agrees with first principles and sug¬ 
gests further performance improvements (e.g., via direct 
phase marginalization). 

C. Scaling 

For a quasicircular compact binary, it is well-known 
that the time to coalescence from a given gravitational- 
wave frequency scales as t{f) oc As the sensitivity 

of detectors improves at low frequencies, this requires the 
use of considerably longer waveforms for detection and 
parameter estimation. For example, the initial LIGO 
detectors were sensitive down to 40 Hz, while the ad¬ 
vanced LIGO detectors could be sensitive down to 10 Hz. 
To cover the extra low frequency portion would require 
waveforms that are ~ 40 times longer. 

Traditional Bayesian parameter estimation is compu¬ 
tationally limited by waveform generation and likelihood 
evaluations. Both of these are linearly proportional to 
waveform length. Note that the likelihood evaluations 
involve computing an inner product as in Eq. which 
is approximated as a finite sum. The number of points in 
the sum is determined by the length of the waveform and 
data being analyzed, which is why the cost of likelihood 
evaluations scales with waveform length. Therefore, one 
would expect the cost of Bayesian parameter estimation 
using a seismic cutoff of /min = 10 Hz to be roughly 
40 times more expensive than the same analysis using 
/min = 40 Hz. 

The method proposed here is not computationally lim¬ 
ited by waveform generation. Recall that for each point 
in the intrinsic parameter space we compute the wave¬ 
form and the inner products between the various modes 
and the data (the assorted Qk,im, Uk.im.Vm', Vk^im.Vm') 
only once. We then integrate over the extrinsic pa¬ 
rameters, which involves evaluating F+ + iFx and the 
for different values of the extrinsic parame¬ 
ters. While generating the waveform and computing in¬ 
ner products does scale with waveform duration, this cost 
is insignificant (even for /min = 10 Hz) compared to the 
integration over extrinsic parameters, which is wholly 
independent of waveform duration. Therefore, as illus¬ 
trated by Fig. The cost of our method increases only 
a little as /min is decreased, in contrast to the sharp in¬ 
crease that occurs for waveform-limited techniques such 
as traditional Bayesian parameter estimation. 

VI. CONCLUSIONS 

We have demonstrated a viable strategy for low- 
latency parameter estimation for long-duration binary 



FIG. 7: Points show the runtime of our parameter es¬ 

timation strategy as a function of the minimum frequency 
/min. For comparison, the solid curve shows the scaling 
oc expected if our runtime was proportional to the 

waveform duration (e.g., runtime proportional to the num¬ 
ber of time samples). Waveforms were generated using the 
standard TaylorTl time-domain code, with mi = I.SSM© 
and m 2 = I.23M0. 


neutron star signals, using an environment which resem¬ 
bles running conditions during the first advanced LIGO 
and Virgo observing runs. 

In the era of multimessenger astronomy, rapid 
and robust inference about candidate compact binary 
gravitational-wave events will be a critical science prod¬ 
uct for LIGO, as colleagues with other instruments per¬ 
form followup and coincident observations [26]. Low- 
latency sky localization [26l |38] , already provides reason¬ 
able accuracy by approximate methods like bayestar, 
so low-latency parameter estimation will further enable 
rapid electromagnetic followup and interpretation of can¬ 
didate gravitational-wave events. 

Motivated by the need for speed, we have introduced 
an alternative, highly-parallelizable architecture for com¬ 
pact binary parameter estimation. First, by using a 
mode decomposition (him) to represent each physically 
distinct source and by prefiltering the data against those 
modes, we can efficiently evaluate the likelihood for 
generic source positions and orientations, independent of 
waveform length or generation time. Second, by inte¬ 
grating over all observer-dependent (extrinsic) parame¬ 
ters and by using a purely Monte Garlo integration strat¬ 
egy, we can efficiently parallelize our calculation over the 
intrinsic and extrinsic space. Third, to target specific 
intrinsic (and extrinsic) parameters for further investiga¬ 
tion, we ingest information provided by the searches and 
BAYESTAR: the trigger masses and estimated sky posi¬ 
tion. Using standard time-domain waveforms in a pro¬ 
duction environment, we can already fully process one 
event in less than I hour, using roughly 1000 cores in 
parallel, producing posteriors and evidence with repro- 
ducibly small statistical errors. By dramatically decreas¬ 
ing the turnaround time for each analysis and by scaling 
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to harness all available resources efficiently, our strategy 
may significantly increase size and scope of parameter 
estimation investigations. 

Our implementation has bounded runtime - one hour 
is the worst case - so we know what resources will be 
needed to analyze a given NS-NS binary. Moreover, the 
parallel algorithm can exploit all available computing re¬ 
sources, without need for communication or coordination 
between jobs, allowing it to operate in computing envi¬ 
ronments with tightly constrained wallclock time. 

Our algorithm can also be immediately applied to any 
noise curve and existing time-domain model that pro¬ 
vides him^ at any mass, including EOBNRv2HM [62] and 
SEOB [61]. Einally, by construction our dominant oper¬ 
ation count cost is independent of the waveform’s length 
(or number of basis vectors). Hence, unlike reduced or¬ 
der methods, our code will run in nearly same amount 
of time now and with full aLIG0-scale instruments with 
/low ^ lOHz. 

A. Comparison with reduced-order quadrature 

Reduced-order quadrature methods provide an effi¬ 
cient representation of a waveform family and any inner 
products against it. Other authors have recently pro¬ 
posed prefiltering the data against the reduced-order ba¬ 
sis m, achieving significant speedup. Eor example, using 
TaylorE2 templates, EO] claim runtimes of order 1 hour, 
comparable to our end-to-end time in the high-precision 
configuration described above. 

Our strategy and reduced-order modeling achieve a 
similar speedup for qualitatively similar reasons: both 
strategies prefilter the data. In our algorithm, at each 
mass point, the data is prefiltered against a set of 
then efficiently reconstruct the likelihood for generic 
source orientations and distances. By integrating the 
likelihood at each mass point over all extrinsic param¬ 
eters, we are dominated by extrinsic-parameter sampling 
and hence not limited by waveform generation. 

B. Future work 

We first recognize that our sampling strategy, espe¬ 
cially the adaptation steps, have some deficiencies, es¬ 
pecially in light of earlier literature [S3 173 . We in¬ 
tend to fortify our current strategy by expanding it be¬ 
yond the independent product of one-dimensional sam¬ 
pling distributions. High-dimensional sampling distribu¬ 
tions can be approximated by kernel density estimators. 
This will allow us to better capture n-dimensional corre¬ 
lations with no appreciable loss of computational speed. 
This approach will also better allow us to incorporate 
more detailed information from other stochastic samplers 
to use as sampling distributions. Improvements to the 
BAYESTAR algorithm will also likely allow its direct inclu¬ 
sion as a sampler, instead of just using the data products. 


Eurthermore, the type of parallelization employed here is 
highly exploitable by GPU accelerated architectures. 

While the alternative architecture proposed here is ef¬ 
ficient and highly parallelizable over extrinsic parame¬ 
ters, all other parameters are currently suboptimally ex¬ 
plored. Eor example, the algorithm described and imple¬ 
mented here adopts a fixed, low-resolution grid to sam¬ 
ple two mass dimensions. All the compact binary specific 
searches expected to run in the next two years are also ca¬ 
pable of providing the specific portion(s) of the template 
bank that was triggered. We will soon be in a position 
to develop a more tightly hierarchical approach, and con¬ 
struct our mass grid to resemble those portions. While 
the method described here should generalize to a few ad¬ 
ditional dimensions, it is not yet clear what additional 
computational resources or architectural changes would 
be needed to apply our technique to all the intrinsic di¬ 
mensions (e.g. to include tides and component spins). 
These higher-dimensional problems are being addressed 
with Markov Chain or Nested Sampling codes, including 
tests of GR, models which include non-astrophysical en¬ 
vironmental effects m, and self-consistent electromag¬ 
netic and gravitational-wave parameter estimation. In 
the short-term future, we plan to explore straightfor¬ 
ward extension of the intrinsic grid with additional tidal 
parameters. Also under exploration is using stochas¬ 
tic banks: these types of banks have been used suc¬ 
cessfully with aligned-spin but otherwise generic binary 
coalescence searches. That said, several methods have 
been proposed for rapid waveform interpolation, includ¬ 
ing SVD and reduced-order methods. In the long run, 
we anticipate being able to perform Monte Carlo inte¬ 
gration over intrinsic dimensions as well, without being 
forced to adopt the relatively ad-hoc intrinsic/extrinsic 
split presented here. 

To provide a complete proof-of-principle illustration of 
our algorithm, we developed an independent production- 
ready code. That said, the standard lalinference pa¬ 
rameter estimation library in general and existing param¬ 
eter estimation codes (lalinference_mcmc and lalin- 
ference_nest) could implement some or all of the low- 
level and algorithmic changes we describe. Eor example, 
MCMC codes could implement our h/^-based likelihood, 
then de-facto marginalize over all extrinsic parameters 
by strongly favoring jumps at fixed intrinsic parame¬ 
ters (A). Any implementation which provides accurate 
marginalized probabilities (e.g., >Cred) can be parallelized 
across parameter space. We hope that by combining 
paradigms, future parameter estimation strategies can 
reach extremely low latencies, ideally of order a few min¬ 
utes, when advanced detectors reach design sensitivity. 
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